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ABSTRACT 

We describe an algorithm for constructing TV-body realisations of equilibrium stellar 
systems. The algorithm complements existing orbit-based modelling techniques using 
linear programming or other optimization algorithms. The equilibria are constructed 
by integrating an iV-body system while slowly adjusting the masses of the particles 
until the time-averaged density field and other observables converge to a prescribed 
value. The procedure can be arranged to maximise a linear combination of the entropy 
of the system and the % 2 statistic for the observables. The equilibria so produced may 
be useful as initial conditions for TV-body simulations or for modelling observations of 
individual galaxies. 
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1 INTRODUCTION 

One of the central problems in stellar dynamics is to con- 
struct made-to-measure stellar systems. For example, (i) 
when modelling observations of an elliptical galaxy we wish 
to find a phase-space distribution function /(r, v) (here- 
after DF) that solves Poisson's equation and the collisionless 
Boltzman equation and reproduces the observed surface- 
brightness distribution, rotation curve, velocity-dispersion 
profile, etc. (in the sense of minimising x 2 , the mean-square 
deviation between the observations and the model); (ii) 
when conducting simulations we wish to construct initial 
states that are iV-body realisations of equilibrium stellar 
systems with given density profile, rotation curve, bulge/disc 
ratio, etc. 

Existing methods for constructing made-to-meaure stel- 
lar systems can be classified as follows: 

• DF-based methods, which solve directly for the DF 
/(r, v). These generally require that all the integrals 
of motion are known explicitly (e.g. if the potential is 
spherical or has Stackel form) or that the DF is assumed 
to depend only on known analytic integrals. These in- 
clude methods that fit observations to few-parameter 
models with analytic DFs such as spherical Michie or 
King models; a difficulty with such methods is that the 
dependence of the derived properties of the galaxy on 
the choice of the model can be large and uncertain (e.g. 
Merritt & Tremblay 1994). A more flexible DF-based 
method is described by Dejonghe (1989), who expands 
the DF for a spherical system in a truncated series of 
basis functions and minimises x 2 subject to the con- 
straint that the DF is positive, using quadratic pro- 
gramming. DF-based methods can be applied to ax- 
isymmetric galaxies, and are completely general so long 



as the DF does not depend on a third integral (e.g. Qian 
etal. 1995, Kuijken 1995). 

• Moment-based methods, which find solutions of the 
Jeans equations (or higher-order velocity moments of 
the collisionless Boltzmann equation) that minimise \ 2 ■ 
This method was used by Binney & Mamon (1982) to 
construct spherical models of M87 and has been applied 
to axisymmetric models by Binney et al. (1990), van der 
Marel et al. (1994), and Magorrian & Binney (1994). A 
drawback is that this procedure does not guarantee that 
there is a positive-definite DF with the required velocity 
moments. 

• Orbit-based methods (Schwarzschild 1979, Schwarz- 
schild 1993) compute the density distribution of a large 
library of orbits in a fixed potential, and then deter- 
mine the weight each orbit must have in order to re- 
produce the desired final state. In this method the DF 
or other integrals of motion are not explicitly required, 
although the DF can be regarded as a sum of delta- 
functions on the phase-space surfaces covered by the or- 
bits. Orbit-based methods are generally ill-conditioned. 
The ill-conditioning can be removed by iterating from 
a smooth initial guess for the orbit weights using the 
Richardson-Lucy method (Newton & Binney 1984), but 
the final weights will then depend in a complicated way 
on the initial guess. A better procedure is to minimise 
X 2 minus a profit function that measures the smooth- 
ness of the distribution of orbit weights. The profit func- 
tion may be an entropy (Richstone & Tremaine 1988) or 
any other function that is large when the DF is smooth 
(Merritt 1993). 

The most flexible of these are orbit-based methods, since 
they do not require that the integrals of motion are known 
and the approximation they provide to the DF is known to 
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be positive if the orbit weights are positive. 

The goal of this paper is to introduce a novel class 
of methods for constructing made-to-measure iV-body re- 
alisations of stellar systems. In the classification above, 
these methods might be called 'particle-based'; they work 
by sculpting an initial iV-body system until it matches the 
prescribed density field and other observables. 



2 THE ALGORITHM 

In most of our discussion we restrict ourselves to construct- 
ing stellar systems in a fixed potential $(r). We suppose that 
$ admits a collisionless equilibrium configuration specified 
by a distribution function in phase space /(r, v). Thus / 
satisfies the time-independent collisionless Boltzmann equa- 
tion: 

dr <9v 

An 'observable' of the stellar system is a quantity of the 
form 



equations with noisy data. 

Some insight into the solutions of equation (4) is offered 
by the following argument. Since e is small the weights Wi 
change only over many orbits; thus we may orbit average: 



dwi(t) 
dt 



-ewi(t) 



j 

£ 

i=i 



(Kji) 



(A, 



(5) 



Here (•) denotes the time-average over an interval that is 
much longer than a typical orbital period but much less than 
e _1 periods, and (Kji) is shorthand for the time-independent 
quantity (K 3 ;[z;(t)]). We have also assumed that the fluctu- 
ations in Kj[zi(t)] and Aj(t) are not correlated, which is 
plausible if many particles contribute to A j . We now define 



/ Ft \ 1/2 

e fc (t) = (-g) (A fc > 

which obeys the difl 
_ = _ e ^ Wi(t) 



0. 



(1) 



which obeys the differential equation 



J K > 



(6) 



(7) 



(«)/(«) d 6 z, 



(2) 



where z = (r, v) and Kj is a known kernel. Suitable ob- 
servables include the surface or volume density at a given 
point, the surface density times the mean line-of-sight veloc- 
ity, the surface density times any moment of the line-of-sight 
velocity, etc. 

Now consider a system of N particles having weights Wi 
and phase-space positions Zj(i) (i < N). The observables of 
this system are 



When we are close to convergence (|®| -C I), the behaviour 
of the right side is dominated by changes in Oj rather than 
changes in w t , so we may replace Wi by a constant, w°. Then 
the vector © satisfies the matrix equation 



d& 

dt 



-eA 0, 



where the matrix A has components 



i-kj 



= £ 



w° 



yj 



(3) 
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The solutions to equation (8) have the form 



Our goal is the following: given a set of distinct observables 



Yj, j = I, . . . , J, construct a system of N particles orbiting in Ok = a,km exp(— A m i), 



(8) 



(9) 



(10) 



the potential $(r) whose time-averaged observables (yj(t)) 
are equal to Yj . 

We hope that this paper will stimulate interest in seek- 
ing better particle-based algorithms than the one we de- 
scribe. 

2.1 The force of change 

The heart of the algorithm is a prescription for changing the 
weights {wi(t)} as the particles proceed along their fixed or- 
bits in the potential ( I > (r). The prescription is similar to that 
employed by Syer & Tremaine (1995) in a different context. 
It consists of applying gentle pressure on Wi according to 
the value of Aj = yj(t)/Yj — 1: if Aj < then increase u>;, 
and if Aj > then decrease w». More precisely we let 



where the eigenvalues {A m } are solutions of the equation 



det(eA - AI) = 0. 



(11) 



dwi(t) 
dt 



Kj[zi(t)] 



Since A is positive-definite by construction (i.e. x' ■ A ■ x > 
for all x), all of its eigenvalues are positive so A m > 0. 
This argument suggests that all observables converge to the 
desired values (|(A)| — > 0) on 0(e _1 ) orbital periods, if e 
is sufficiently small and we start close to the correct final 
state. 

For comparison, orbit-based methods evaluate and store 
the entire matrix (Kji) (by following the orbits for a fixed 
time that is much longer than the orbital period; the matrix 
(K) is often called the "orbit library"), then solve the matrix 
equation 



A,(i). 



(4) 



Yj = Y J (K ji )w i ; 



(12) 



where e is small and positive, and Z is so far arbitrary. The 
factor Wi on the right side ensures that dwi/dt — » as w» — > 
so that u>i cannot become negative. The factor Kj/Zj 
ensures that the difference Aj changes the weight Wi only 
if particle i is contributing to observable j. Equation (4) is 
closely related to Lucy's (1974) method for solving integral 



if iV > J the matrix equation is ill-conditioned and must be 
solved subject to a constraint that maximizes some profit 
function such as entropy. The storage needed by particle- 
based methods is O(NJ) whereas the storage needed by 
particle-based methods is only O(N). 
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2.2 The kernel 

Suppose that the observable y r is the density at r. What is 
the appropriate kernel K r (r' ,v')7 One approach would be 
to use a smooth kernel (a la SPH) with 



K T {r',V) = W(r-v',h), 



(13) 



where W is some smooth function, and h is the resolution 
length (possibly a function of r); W should be normalised 
so that J W(r, h) d 3 r = 1. If h is too large resolution is lost, 
while if h is too small the observables yj fluctuate strongly 
because too few particles contribute to a given observable. 

The smooth-kernel approach is expensive because near- 
est neighbours have to be found at each timestep and we 
have therefore adopted a different method. We first divide 
the coordinate space into bins. Then we set K T (r',v') to 
zero if r and r' are not in the same bin, and equal to the 
inverse of the volume of the bin otherwise. Obviously K T1 
and K T2 are the same if ri and r2 are in the same bin, so 
there can be at most one density observable per bin. We are 
still free to choose the parameter Zj in (4); a simple choice 
is to set Zj equal to the inverse of the volume of the bin. 
Thus Kj(zi)/Zj is unity if Vi is in bin j, and zero otherwise. 

An improvement to this simple scheme, which we em- 
ploy here, is to borrow from the smooth-kernel approach and 
to smear each particle into neighbouring bins. Each parti- 
cle is replaced by a gaussian distribution at the bin centre 
closest to the particle position, with dispersion equal to half 
the bin width. Neighbouring bins are assigned a weight cor- 
responding to the integral of the gaussian over those bins 
(normalized so that the total contribution over all bins is 
Wi). The bin closest to the particle position thus contains 
about two thirds of its weight. This procedure corresponds 
nicely to the phenomenon of seeing in the case of a projected 
observable. 

2.3 Resolution and smoothing 

Suppose that we divide the d-dimensional coordinate space 
into M d bins, and each density observable corresponds to 
the mass per unit volume in a single bin. Then at any given 
time there are on average N/M d particles contributing to 
each observable, and the rms statistical fluctuations in Aj 
will be of order 6 ~ (M d /N) 1/2 . These fluctuations can be 
kept small if d = 1 (spherical or one-dimensional systems); 
for example if M = 30 and N = 1000 we have S ~ 0.2. 
However, for triaxial systems (d — 3) the fluctuations will 
be much larger for reasonable values of M and N. 

To improve this situation we employ a form of tempo- 
ral smoothing which effectively boosts N without any need 
for extra storage or computation per time step. This is im- 
plemented by replacing Aj(t) in equation (4) with Aj(t), 
where 



(14) 



Aj(t) = a / A 3 (t-r)e" aT dr, 



and a is small and positive. This quantity is most easily 
calculated using the equivalent differential equation 



(15) 



dA 

dt 

Each particle is smeared backwards along its trajectory and 
represents a set of virtual or ghost particles strung out along 
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Figure 1. The weights {«>;} as a function of time for simulation 
A. 

the orbit with ever decreasing weights. In effect, temporal 
smoothing increases the effective number of particles from 
N to 



N eS = N 



ti 
g 

At' 



(16) 



where At is the timestep and ti = (In 2) /a is the half-life 
of the ghost particles. 

Some insight into the effect of this smoothing procedure 
is given by the following argument. As in Section 2.1, let (•) 
denote the time-average over an interval that is much longer 
than a typical orbital period but much less than a -1 . The 
time-averaged version of equation (15) is 



d(A) - =a«A> - (A)) or a(e-§), 



dt 



(17) 



where © is defined by replacing A by A in (6). The evolu- 
tion of © is described by equation (8), with © replaced by 
© on the right side. The solution to equations (8) and (17) 
has the form (10), with eigenvalues A that satisfy 



det(eA- AI + A 2 ^ 1 !) =0. 



(18) 



In the simple case of a single observable, we have A = 
\a ± \(c? - 4aeA) 1/2 , where A > 0. For a > eA, we find 
A ~ eA, which is the same convergence rate that would ob- 
tain without temporal smoothing. However, for a < 4eA, A 
is complex, so the observables execute damped oscillations 
rather than converging smoothly, and the convergence rate 
is |Re(a) which is slower than the convergence rate without 
smoothing whenever a < 2eA. 

We conclude that excessive temporal smoothing is un- 
desirable and that the maximum smoothing time aT l should 
satisfy 

e < a, (19) 
assuming |A| = 0(1). 

2.4 Maximum entropy 

If the number of particles exceeds the number of observ- 
ables, the differential equations (4) are ill-conditioned. In 
practice this means that the observables {yj{t)} will con- 
verge fairly rapidly to {Yj}, but that the individual particle 
weights {wi } will continue to drift long after the observables 
have converged. Such behaviour is undesirable as we would 
like to use the particle weights to predict other properties of 
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the stationary stellar system. 

To remove the ill-conditioning, we can maximise some 
form of profit function, such as the entropy 



Wi \og(wi/mi), 



(20) 



where {riij} is a pre-determined set of weights (the 'prior'). 
Thus we maximise the function 



F = fiS 
where 



\x 2 



X 



-j XX 

2 = 1 



Equation (4) is replaced by 

dwi(t) 



dt 



= tWi(t) 



J 



2 = 1 



(21) 



(22) 



(23) 



The factor /u is a measure of the relative contribution of 
X 2 and S to the final state: if /i is large we get a smooth so- 
lution (in the sense that the {wi} are close to the {rrii}) but 
a large x' ' ■> while if /i is small the solution is not smooth but 
X 2 is likely to be smaller. The parameter /i can be specified 
at the start of the calculation, or adjusted as the calculation 
proceeds using a prescription such as 



m [D 2 



(24) 



dfi(t) 
dt 

with < r\ <C 1; in this case x 2 will converge to the specified 
value D 2 . In the simulations described here we have kept /i 
constant. 

The condition N > J (number of particles exceeds 
the number of bins) is neither a strict criterion for ill- 
conditioning nor a necessary condition for a sensible result, 
since different observables are not independent (both be- 
cause of the spatial smoothing described in Section 2.2 and 
because a single orbit contributes to many observables) . 



3 RESULTS 

3.1 One-dimensional results 

We first present the results of experiments in one dimen- 
sion, to illustrate the effects of the various parameters in 
the algorithm. We use for the background potential 

1 



*(a;) 



(fc 2 



2)1/2 



(25) 



with b = | , and for the observables we use the density dis- 



tribution 



(52 + £2)5/2- 



(26) 



The period of a low-energy orbit is 27r6 3//2 = 1.209. We use 
a fourth-order leapfrog with timestep At = 0.1 to integrate 
the particle orbits, and adjust the weights {wi} after every 
timestep according to equation (23). To measure the density 
we use a uniform grid with 16 bins in the range x £ [—1,1]. 
With b — | there is a density contrast of about 200 between 
the inner and outermost bins. We evaluate the success of 




Figure 2. The rms fractional density error \ (eq. 22) for simula- 
tion A (bottom curve), and simulation B (top curve). Simulation 
B has no orbit averaging. The initial state has small x 2 , but is not 
in equilibrium, so x 2 immediately grows as soon as the particles 
start to move. 
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Figure 3. The density error \ for simulations A (top curve), C 
(middle curve) and D (bottom curve). The initial state has small 
X 2 , but iis not in equilibrium, so x 2 immediately grows as soon 
as the particles start to move. 

Table 1. The parameter values used in the simulations. 





N 


10 2 a 


10 2 e 


10 2 M 


D 


A 


100 


5.24 


2.5 


1 


0.05 


B 


100 


oo 


2.5 


1 


0.05 


C 


100 


5.24 


500 


1 


0.05 


D 


100 


5.24 


500 







E 


1000 


51.9 


25 


1 


0.05 


3D 


4000 


5.24 


2.5 


0.1 


0.05 



the algorithm by examining the time evolution of the {wt} 
and of x 2 - The parameter values we use are summarised 
in Table 1. In the initial state the particles were uniformly 
distributed in a; £ [— 1, 1] with equal weights. The veloci- 
ties were uniformly distributed in the range allowed by the 
condition that their orbits are restricted to a; € [—1, 1]. 

Figure 1 shows the evolution of the weights {wi} for 
simulation A, in which a = 0.0524, e = 0.025 and ji = 0.01. 
All the {wt} evolve smoothly, and converge in a time of 
order e _1 . We note that the final equilibrium has a wide 
range of masses, which suggests that some gain in efficiency 
might be possible if the particles with smaller w could be 
replaced with a smaller number of particles with a larger 
w (for example, we could discard particles whose weights 
fall below a threshold and replace them with new particles 
on randomly chosen orbits, or combine low-weight particles 
with nearby orbits). 
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Figure 4. A representative sample of 6 of the {«>;} for each of the simulations A (top left), B (top right), C (bottom left) and D (bottom 
right). 



0.5 

0.2 
0.1 

X 

0.05 
0.02 








25 



50 



75 

t 



100 125 150 



Figure 5. The density error \ for simulations E and A (with the 
time coordinate for A shrunk by a factor of ten to facilitate the 
comparison). 

Figure 2 shows the evolution of \ 2 i n simulation A and 
compares with simulation B in which the orbit averaging is 
removed (a = oo). Both the final value of \ 2 an d its frac- 
tional fluctuation are smaller in A, illustrating the benefits of 
orbit-averaging. Figure 3 shows \ 2 in simulations A, C and 
D. Simulations C and A differ only in the parameter e: C 
has larger e and hence converges more rapidly, although the 
final states in A and C are very similar. Simulation D and 
C differ only in the parameter fi: D has fi — (no entropy 
constraint). The final value of x 2 is lower in simulation D — 
and the fluctuations are larger — since the simulation does 
not have to trade higher \ 2 for lower entropy. 

Figure 4 shows the evolution of the same 6 particles 
from simulations A, B, C and D. Comparing A (e = 0.48a) 
with C and D (e = 95a), we see the effect of excessive tem- 
poral smoothing (equation 19): the trajectories w» are noisy 
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Figure 6. As in Figure 4 but for simulation E. 

and do not converge smoothly to their final values. Compar- 
ing simulations C and D we see that this noise is worse for 
small w when the entropy constraint is removed. Note that 
the larger w are not much different in C and D. The smaller 
w do not converge in D, but they do not contribute much to 
X 2 - In C the entropy constraint has tied down the smaller 
values of w. 

Figure 5 compares the effects of orbit averaging and par- 
ticle number. Simulation E has 10 times the number of par- 
ticles as A, but the smoothing time a -1 is 10 times smaller; 
as a consequence the noise in the two simulations is about 
the same. The shorter smoothing time in E allows a larger 
e so that E converges faster than A; note that the time co- 
ordinate in Figure 5 has been shrunk by a factor of 10 for 
simulation A. Thus there is a tradeoff between number of 
particles and convergence time; simulations A and E each 
take about 2 minutes of CPU on a DEC Alpha 3000/300. 
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Figure 6 shows the same 6 particle weights as in Figure 4. 



3.2 Three-dimensional results 

We have performed a number of three-dimensional simula- 
tions. The parameters for the algorithm are in each case the 
same, as given in the last row of Table 1. 



3.2.1 Mass models 

The mass models were of three types: 

PS Plummer sphere, with density law given by equation 

(26), with x denoting the radius and AirG = 3b 2 . We 

denote this model 'PS'. 
PT Triaxial Plummer model with density law given by 

equation (26) with x denoting the triaxial radius, s, 

defined by 



2 2 2 

2 X V Z 

A 2 B 2 C 2 ' 



(27) 



and (A, B, C) = (1.41, 1.12, 1.00). 
ST Schwarzschild's (1979) model with aixs ratios given by 
(A,B,C) = (2.00, 1.25, 1.00). 

3.2.2 Force calculation 

The force calculation is carried out in one of two ways: 
A Analytically. In the case of PT the analytic potential is 
not that which would be self-consistently generated by 
the mass model, but rather the Plummer potential with 
s replacing x in equation (25) . 
F Numerically using the Fourier Convolution Theorem 
and FFT on a 16 a grid. Mass is assigned to the grid 
from the desired mass model, and the forces and po- 
tential are calculated once only at the beginning of the 
simulation. 

In the ST models the numerical and analytic potentials are 
significantly different (40% on average) because the model 
has inifinite mass and has been truncated at finite radius. 
In the PS models the forces agree to within a few percent. 

3.2.3 Initial condition 

We also use three different types of initial condition: 
B Particles uniformly distributed in r < 1, with velocities 
uniform in the range allowed by the condition that their 
orbits remain in r < 1. In the Schwarzschild potential 
this initial condition produces a preponderence of box 
orbits, hence we refer to it as 'box-dominated'. 
T Particles uniformly distributed in r < 1. Additionally 
20% have their velocities chosen as for B, and 80% 
are given velocities perpendicular to their radius vec- 
tor with magnitude equal to that of a circular orbit in 
a spherical potential with the same total force. In the 
Schwarzschild potential this initial condition produces 
more tube orbits than B, hence we refer to it as 'tube- 
dominated'. 

E Particles distributed in r < 1 according to the required 
mass model. Thus there are extra particles in the inner 
regions as compared with B or T. Velocities chosen as 
for T. 

To summarise the notation by example, simulation PSAB is 
a Plummer sphere with analytic forces, and box-dominated 



Table 2. The results of the three-dimensional 
simulations. Column 1 is the simulation type; 
column 2 the initial percentage by mass of tube 
orbits (as defined in the text); column 3 the 
final percentage by mass; column 4 the final 
percentage by number; and column 5 the final 
value of x- 
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PSAE 
PSFT 
PTAT 
PTFT 
PTFE 
STAT 
STAE 
STFB 
STFT 
STFE 
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Figure 7. A scatter plot of L m ax versus L m i n for simulation 
PTFT (only 10% of the total number of particles are shown). 
The values of L plotted are relative to the average magnitude of 
the angular momentum of each particle. 




Figure 8(a). The density along the principal axes in simulation 
PTFE showing the target mass model (solid) and the final state 
observed distribution (diamonds). 

initial condition; simulation STFE is a Schwarzschild triaxial 
model with Fourier numerical forces, and extra sampling of 
the inner regions. 

In each case we measure the density using a uniform 
cubical grid with M — 16 elements on a side for a total of 
2 12 observables. The core radius of alll the models b = |, 
and we restrict particles to orbits with r < 1. Simulation 
PSAB took about 200 minutes of CPU on the same ma- 
chine as simulation A, a factor of ~ 100 longer; most of 
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r 

Figure 8(b). As in Figure 8(a) for simulation STFE. 

this factor reflects the factor of 40 increase in the number of 
particles, while the rest reflects the increased complexity of 
the differential equations (which must follow 6 phase-space 
coordinates instead of 2) and increased access time to the 
much larger array which represents the observables. 

Orbit classification is carried out by recording the maxi- 
mum and mininum values of the angular momentum L about 
the z-axis (the shortest principal axis of the triaxial models) . 
Box orbits tend to reverse direction and 'retrace' their paths, 
so I/max — — Lmin- Figure 7 shows the distribution of L m ax 
and I/min for simulation PTFT. The box orbits essentially 
lie in the upper left quadrant. Tube orbits are here defined 
as those with L ma x£ m in > 0. Table 2 gives the percentage 
of tube orbits in each of the three-dimensional simulations. 

With the exception of STFB all simulations converge 
well: none give final values of % that are more than a factor 
of 2 different from the preferred value D — 0.05. Note that 
the initial condition B, although it works fine for the Plum- 
mer models, fails to produce enough tube orbits to support 
the Schwarzschild potential, and hence the final value of \ 
is higher than the preferred value D = 0.05. The algorithm 
converges, but it is forced to substitute smoothness for ac- 
curacy because it has too few tube orbits. Also note that the 
various realisations of the Schwarzschild model have differ- 
ent final weights of tube orbits. This is not surprising as it is 
expected that the degeneracy in a model which only matches 
the volume density might lead to a variety of box/tube mix- 
tures. 

The behaviour of the weights {wi(t)} and of \ as a 
function of time look very similar to the one-dimensional 
case and they are not reproduced here. In Figure 8 we show 
the density along the principal axes in two of the simulations. 

4 DISCUSSION 

4.1 Choosing the parameters 

How should the parameters, (N, M, a, e, fi) be chosen to op- 
timise the investment in CPU? The arguments given in Sec- 
tion 2 lead to the following guidelines. 

First choose D, the value of x which is as large as can be 
tolerated — in the simulations above we had \ of a few times 
10~ 2 . Then choose the resolution required, via the number 
of resolution elements M d . This informs our choice of N and 
a through (cf. equation 16) 

2 M d M d A 

* ~ ~ — aAt - (28) 



Thus, if N is limited by storage requirements, equation (28) 
determines the desired value of a. 

Once we have a then we must choose e smaller than a 
(equation 19), bearing in mind that the convergence time 
will be of order (T orbital periods. In the simulations we 
found that e ~ 0.5a is about right. 

The final step is to choose fi. This is largely a matter 
of taste since fi determines the balance between smoothness 
and accuracy (Merritt & Tremblay 1992 discuss this issue 
in another context). The natural choice for fi is the one for 
which x 2 reflects the observational errors; and this is the 
value which will be obtained if n is allowed to vary according 
to equation (24). We do not recommend equation (24) for 
general use however because convergence of p can be rather 
slow. One method, which has the advantage of being at least 
semi-quantitative, is to do a run including equation (24), 
with D equal to the preferred value of \- F ar from having 
to converge, it only has to get jx into the right ballpark, and 
then we go back to constant fi for an extended run. 

4.2 The initial condition and prior 

The initial condition of the particles should be chosen to 
sample phase space as well as possible. Considerable effort 
is sometimes devoted to this choice in orbit-based calcu- 
lations. In the simulations presented here we have chosen 
simply to sample phase-space uniformly (Section 3). With 
detailed knowledge of the potential and its orbit families, 
more informed choices could be made. 

We can think of the prior {m,}, together with the parti- 
cle positions and velocities, as a random realisation of some 
known DF /o(r, v). In the simulations presented here we use 
{rrii} which are all equal. In those where the initial particle 
positions and velocities sample phase space uniformly, for 
example, we are effectively using /o which is initially equal 
to a constant. If the initial condition is well mixed, as we 
may reasonably hope, then /o is independent of time. The 
smoothness constraint has the effect of driving the DF of 
the system towards fo- 

With a suitable choice of prior {m;}, the final equilib- 
rium should not depend on the initial condition. It should 
merely reflect the choice of n, which determines the balance 
between smoothness and accuracy. The resolution of the fi- 
nal equilibrium in phase space, however, may be affected by 
the choice of initial condition. The wide range of weights in 
Figure 1 is a reflection of this fact. However, only perfect 
knowledge of the target DF would allow one to set up an 
initial condition which led to an equilibrium with all the 
{wi} equal. This problem could be alleviated 'on the fly' by 
methods that kill particles with low weights and split par- 
ticles with high weights into several daughter particles with 
similar orbits. 

4.3 Comparison with orbit-based methods 

The particle-based method we have described here has sev- 
eral advantages over orbit-based methods. 

• Particle-based methods use less storage: if there are N 
particles (or orbits) and J < N observables, orbit-based 
methods must store O(NJ) variables (the contribution 
of each orbit to each observable), while a particle-based 
method stores only O(N) variables (the particle weights 
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at a given time) . This advantage is particularly impor- This paper has been produced using the Royal Astronomical 
tant in systems with a large number of observables (tri- Society/Blackwell Science TgX macros, 
axial systems, or systems in which the entire line-of- 
sight velocity distribution is observed). 

• Although we have only discussed constructing stellar 
systems in a fixed potential, it should be possible to 
generalise particle-based methods so that the potential 
is determined self-consistently by the particles. Perhaps 
the best approach would be to expand the potential as 
a linear combination of a set of basis functions (which 
can be chosen to preserve any desired symmetries, e.g. 
spherical symmetry). The coefficient of each basis func- 
tion, determined from the evolving weights of the parti- 
cles, could be orbit averaged (cf. equation 14) to reduce 
the effects of relaxation. 

• Model construction with orbit-based methods is a mul- 
tistep process: first compute the luminosity density 
from the surface brightness; then solve Poisson's equa- 
tion assuming (say) constant mass-to-light ratio; then 
integrate orbits in this potential to construct an or- 
bit library (the matrix containing the contribution of 
each orbit to each observable); then use some inver- 
sion/optimization method such as maximum entropy, 
Lucy's method, or linear or quadratic programming 
to determine the orbit weights. In a self-consistent 
particle-based methods all of these steps could be done 
at the same time. 

Particle-based methods also have disadvantages com- 
pared to orbit-based methods: they use more computing cy- 
cles per orbit because the orbits must be followed for a longer 
time; as in all Monte Carlo methods, accuracy only scales 
as iV 1//2 ; poorly chosen values of parameters such as a and 
e can cripple the method. 

The simple experiments we have described show that 
particle-based methods might be able to compete with orbit- 
based methods. Possible improvements include: (i) non- 
uniform spatial grids so that the resolution is highest near 
the centre of the galaxy; (ii) methods that kill particles with 
low weights and split particles with high weights into several 
daughter particles with similar orbits; (iii) gridless methods 
based on smooth kernels (Section 2.2); (iv) determining the 
potential self-consistently from the gravitational field of the 
particles. 
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